Textual Paraphrase Dataset for Deep Language Modelling
Abstract
The Turku Paraphrase Corpus is a dataset of over 100,000 Finnish paraphrase pairs. During the corpus creation, we strived to gather challenging paraphrase pairs, more suitable to test the capabilities of natural language understanding models. The paraphrases are both selected and classified manually, so as to minimise lexical overlap and provide examples that are structurally and lexically different to the maximum extent. An important distinguishing feature of the corpus is that most of the paraphrase pairs are extracted and distributed in their native document context, rather than in isolation. The primary application for the dataset is the development and evaluation of deep language models, and representation learning in general.
Similar resources
Paraphrase Substitution for Recognizing Textual Entailment
We describe a method for recognizing textual entailment that uses the length of the longest common subsequence (LCS) between two texts as its decision criterion. Rather than requiring strict word matching in the common subsequences, we perform a flexible match using automatically generated paraphrases. We find that the use of paraphrases over strict word matches represents an average F-measure ...
Paraphrase and Textual Entailment Generation
One particular information can be conveyed by many different sentences. This variety concerns the choice of vocabulary and style as well as the level of detail (from laconism or succinctness to total verbosity). Although verbosity in written texts is considered bad style, generated verbosity can help natural language processing (NLP) systems to fill in the implicit knowledge. The paper presents...
Paraphrase and Textual Entailment Generation in Czech
Paraphrase and textual entailment generation can support natural language processing (NLP) tasks that simulate text understanding, e.g., text summarization, plagiarism detection, or question answering. A paraphrase, i.e., a sentence with the same meaning, conveys a certain piece of information with new words and new syntactic structures. Textual entailment, i.e., an inference that humans will j...
Paraphrase and Textual Entailment Recognition and Generation
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also tr...
Journal
Journal title: Cognitive Technologies
Year: 2022
ISSN: 2197-6635, 1611-2482
DOI: https://doi.org/10.1007/978-3-031-17258-8_27